source("utils.R")
theme_set(theme_minimal())
For this representation of cells, we rely on the SpatialFeatureExperiment package. For preprocessing of the dataset, we refer the reader to the vignette of the Voyager package.
class: SpatialFeatureExperiment
dim: 980 100290
metadata(0):
assays(1): counts
rownames(980): AATK ABL1 ... NegPrb22 NegPrb23
rowData names(3): means vars cv2
colnames(100290): 1_1 1_2 ... 30_4759 30_4760
colData names(17): Area AspectRatio ... nCounts nGenes
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
spatialCoords names(2) : CenterX_global_px CenterY_global_px
imgData names(1): sample_id
unit: full_res_image_pixels
Geometries:
colGeometries: centroids (POINT), cellSeg (POLYGON)
Graphs:
sample01:
[1] 20
In this vignette, we show the metrics for a ligand-receptor pair, CEACAM6 and EGFR, which was identified in the original publication of the CosMx dataset (He et al. 2022).
One of the challenges when working with (irregular) lattice data is the construction of a neighbourhood graph (Pebesma and Bivand 2023). The main question is what to consider as neighbours, as this choice affects downstream analyses. Various methods exist to define neighbours, such as contiguity-based neighbours (neighbours in direct contact), graph-based neighbours (e.g., \(k\)-nearest neighbours), distance-based neighbours or higher-order neighbours (Getis 2009; Zuur, Ieno, and Smith 2007; Pebesma and Bivand 2023). The documentation of the spdep package gives an overview of the different methods.
Segmentation of individual cells is challenging (Wang 2019), and constructing contiguity-based neighbours from individual cell segmentations assumes very accurate segmentation results. Furthermore, contiguity neglects the influence of more distant, not directly adjacent neighbours, which, depending on the feature of interest, might not be a valid assumption.
In an irregular lattice, defining the spatial weights matrix is more complex, as several options exist. One option is to base the neighbourhood graph on cells that are in direct contact with each other (contiguous), as implemented in the poly2nb method. As cell segmentation is notoriously imperfect, we set a snap distance of 20, so that all cells whose boundaries lie within a distance of 20 (in the coordinate units of the cell geometries) are considered contiguous.
Alternatively, we can use a \(k\)-nearest neighbours approach. The choice of \(k\) is somewhat arbitrary.
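To make the \(k\)-nearest neighbour construction concrete, the sketch below builds a row-standardised kNN weights matrix from cell centroids in base R. It is a minimal illustration using a dense matrix on simulated coordinates; the helper `knn_weights` is hypothetical, and for a dataset with 100,000 cells one would use sparse tools such as spdep's `knearneigh()` and `knn2nb()` instead.

```r
# Hypothetical helper: row-standardised k-nearest-neighbour weights (dense, base R)
knn_weights <- function(coords, k) {
  n <- nrow(coords)
  d <- as.matrix(dist(coords))   # pairwise Euclidean distances between centroids
  diag(d) <- Inf                 # a cell is never its own neighbour
  W <- matrix(0, n, n)
  for (i in seq_len(n)) {
    nb <- order(d[i, ])[seq_len(k)]  # indices of the k closest cells
    W[i, nb] <- 1 / k                # row-standardise: each row sums to 1
  }
  W
}

# Usage on simulated centroid coordinates
set.seed(1)
coords <- cbind(x = runif(50), y = runif(50))
W <- knn_weights(coords, k = 5)
```

Note that kNN relations are generally not symmetric: cell \(j\) can be among the \(k\) nearest neighbours of cell \(i\) without the reverse being true, which is also why every cell in a kNN graph has at least \(k\) links.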
The graphs below show noticeable differences. In the contiguous neighbour graph on the left (neighbours in direct contact), we can see the formation of distinct patches that are not connected to the rest of the tissue. In addition, some cells have no direct neighbours at all. In contrast, the k-nearest neighbours (kNN) graph on the right reveals that these patches tend to be connected to the rest of the structure.
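The disconnected patches and neighbourless cells described above can be quantified by counting the connected components of the neighbour graph and the cells without any link. The base R sketch below does this with a breadth-first search on a dense weights matrix; the helper `graph_components` is hypothetical, and for `nb` objects spdep provides `n.comp.nb()` for the same purpose.

```r
# Hypothetical helper: connected components of a neighbour graph given as a weights matrix.
# Links are treated as undirected (i ~ j if W[i, j] > 0 or W[j, i] > 0).
graph_components <- function(W) {
  A <- (W > 0) | t(W > 0)
  n <- nrow(A)
  comp <- integer(n)              # 0 = not yet visited
  id <- 0L
  for (s in seq_len(n)) {
    if (comp[s] != 0L) next
    id <- id + 1L
    comp[s] <- id
    queue <- s
    while (length(queue) > 0) {   # breadth-first search starting from cell s
      v <- queue[1]
      queue <- queue[-1]
      nbrs <- which(A[v, ] & comp == 0L)
      comp[nbrs] <- id
      queue <- c(queue, nbrs)
    }
  }
  comp
}

# Toy graph: two patches (cells 1-3 and 4-5) and one isolated cell (6)
W <- matrix(0, 6, 6)
W[1, 2] <- W[2, 1] <- W[2, 3] <- W[3, 2] <- 1
W[4, 5] <- W[5, 4] <- 1
comp <- graph_components(W)
n_components <- max(comp)
n_isolated <- sum(rowSums(W > 0) == 0)
```

Applied to a contiguity graph and a kNN graph of the same tissue, this kind of count makes the visual difference between the two graphs explicit.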
Here we set the arguments for the examples below.
For two continuous variables, the global bivariate Moran’s I is defined as (Wartenberg 1985; Bivand 2022)
\[I_B = \frac{\Sigma_i(\Sigma_j{w_{ij}y_j\times x_i})}{\Sigma_i{x_i^2}}\]
where \(x_i\) and \(y_i\) are the values of the two variables of interest at position \(i\) (standardised to zero mean and unit variance) and \(w_{ij}\) is the value of the spatial weights matrix for positions \(i\) and \(j\).
The global bivariate Moran’s I measures the association between \(x\) and the spatial lag of \(y\). The result may therefore overestimate the spatial autocorrelation, because it also reflects the inherent (non-spatial) correlation of \(x\) and \(y\) (Bivand 2022).
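As a sketch of the formula, the base R function below computes the global bivariate Moran’s I for standardised \(x\) and \(y\) with a dense, row-standardised weights matrix. It is not spdep’s implementation, and the helper name `moran_bv_dense` is hypothetical. The toy example places four cells on a line; with \(y = x\) the statistic reduces to the univariate Moran’s I, and with \(y = -x\) it flips sign, which illustrates how the non-spatial correlation of \(x\) and \(y\) enters the measure.

```r
# Global bivariate Moran's I following the formula above (dense weights, base R).
# W is assumed to be row-standardised (each row sums to 1).
moran_bv_dense <- function(x, y, W) {
  x <- as.vector(scale(x))      # zero mean, unit variance
  y <- as.vector(scale(y))
  lag_y <- as.vector(W %*% y)   # spatial lag of y: sum_j w_ij * y_j
  sum(x * lag_y) / sum(x^2)
}

# Four cells on a line, each linked to its direct neighbours
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
W <- W / rowSums(W)             # row-standardise
x <- c(1, 2, 3, 4)
I_xx <- moran_bv_dense(x, x, W)    # equals the univariate Moran's I of x
I_xneg <- moran_bv_dense(x, -x, W) # perfect negative correlation flips the sign
```

Here `I_xx` is 0.4 and `I_xneg` is -0.4: the spatial arrangement is identical in both cases, and only the non-spatial relationship between the two variables changed.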
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 499 bootstrap replicates
CALL :
boot::boot.ci(boot.out = res_xy, conf = c(0.99, 0.95, 0.9), type = "basic")
Intervals :
Level Basic
99% (-0.3232, -0.3087 )
95% (-0.3213, -0.3101 )
90% (-0.3204, -0.3110 )
Calculations and Intervals on Original Scale
Some basic intervals may be unstable
From the result of the global measure, the overall spatial autocorrelation of the two genes is not significant.
Lee’s L is a bivariate measure that combines the non-spatial Pearson correlation with spatial autocorrelation via Moran’s I (Lee 2001). This enables us to assess the spatial dependence of two continuous variables in a single measure. It is defined as
\[L(x,y) = \frac{n}{\sum_{i=1}^n(\sum_{j=1}^n w_{ij})^2}\frac{\sum_{i=1}^n(\sum_{j=1}^n w_{ij}(x_j-\bar{x}))(\sum_{j=1}^n w_{ij}(y_j-\bar{y}))}{\sqrt{\sum_{i=1}^n(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^n(y_i-\bar{y})^2}},\]
where \(w_{ij}\) is the value of the spatial weights matrix for positions \(i\) and \(j\), \(x\) and \(y\) are the two variables, \(\bar{x}\) and \(\bar{y}\) their means, and \(n\) is the number of observations.
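The base R sketch below translates the global Lee’s L formula directly, using a dense weights matrix; the helper name `lee_L` is hypothetical and this is not the spdep or Voyager implementation. In the four-cell toy example, \(x\) is perfectly correlated with itself (Pearson correlation 1), yet \(L(x,x)\) is only 0.2, because the statistic is computed on spatially smoothed (lagged) values.

```r
# Global Lee's L following the formula above (dense weights, base R).
lee_L <- function(x, y, W) {
  n <- length(x)
  dx <- x - mean(x)
  dy <- y - mean(y)
  lag_dx <- as.vector(W %*% dx)   # sum_j w_ij (x_j - mean(x))
  lag_dy <- as.vector(W %*% dy)   # sum_j w_ij (y_j - mean(y))
  S2 <- sum(rowSums(W)^2)         # sum_i (sum_j w_ij)^2
  (n / S2) * sum(lag_dx * lag_dy) / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}

# Four cells on a line with row-standardised neighbour weights
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
W <- W / rowSums(W)
x <- c(1, 2, 3, 4)
L_xx <- lee_L(x, x, W)
L_xneg <- lee_L(x, -x, W)
```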
Similar to the global variant of Lee’s L, the local variant (Lee 2001) is defined as
\[L_i(x,y) = \frac{n\,(\sum_{j=1}^n w_{ij}(x_j-\bar{x}))(\sum_{j=1}^n w_{ij}(y_j-\bar{y}))}{\sqrt{\sum_{k=1}^n(x_k-\bar{x})^2}\sqrt{\sum_{k=1}^n(y_k-\bar{y})^2}}.\]
When the variables of interest are gene expression measurements, local Lee’s L is a measure of spatial co-expression and can also serve as a metric of co-localization. Unlike the global version, the values are not aggregated over all locations; each \(L_i\) shows the contribution of location \(i\) to the global metric. Positive values indicate co-localization, negative values indicate segregation.
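A minimal base R sketch of the local variant follows (hypothetical helper `local_lee`, dense weights, not the Voyager implementation used below). With row-standardised weights, \(\sum_i(\sum_j w_{ij})^2 = n\), so the global Lee’s L is simply the mean of the local values; the toy example uses two variables with a mixed pattern, so the local values have both signs.

```r
# Local Lee's L_i following the formula above (dense weights, base R).
local_lee <- function(x, y, W) {
  n <- length(x)
  dx <- x - mean(x)
  dy <- y - mean(y)
  lag_dx <- as.vector(W %*% dx)   # spatial lag of the centred x
  lag_dy <- as.vector(W %*% dy)   # spatial lag of the centred y
  n * lag_dx * lag_dy / (sqrt(sum(dx^2)) * sqrt(sum(dy^2)))
}

# Four cells on a line with row-standardised neighbour weights
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
W <- W / rowSums(W)
x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)
Li <- local_lee(x, y, W)   # 0.6, -0.2, -0.2, 0.6
L_global <- mean(Li)       # 0.2, equal to the global L for row-standardised W
```

The outer cells show co-localization (positive \(L_i\)), while the two inner cells, where the smoothed values of \(x\) and \(y\) deviate from their means in opposite directions, show segregation (negative \(L_i\)).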
This is of interest for detecting co-expressed ligand-receptor pairs. SpatialDM (Li et al. 2023), for example, builds on bivariate Moran’s I to detect such pairs.
# Local Lee's L for the ligand-receptor pair on the chosen neighbourhood graph
sfe_tissue <- runBivariate(sfe, type = "locallee",
                           feature1 = features[1], feature2 = features[2],
                           colGraphName = colGraphName)
# Plot the local values on the cell geometries with a diverging scale centred at 0
plotLocalResult(sfe_tissue, "locallee",
                features = localResultFeatures(sfe_tissue, "locallee"),
                ncol = 2, divergent = TRUE, diverge_center = 0,
                colGeometryName = colGeometryName)

Geary’s C is a measure of spatial autocorrelation based on the squared differences between the value of a variable at a location and at its neighbours. Anselin (2019) defines the local version as
\[c_i = \sum_{j=1}^n w_{ij}(z_i-z_j)^2,\]
where \(z\) is the standardised variable of interest. It can be generalized to \(k\) variables by summing the univariate statistics:
\[c_{k,i} = \sum_{v=1}^k c_{v,i},\]
where \(c_{v,i}\) is the local Geary’s C for the \(v\)th variable at location \(i\). The number of variables is not fixed, which makes the interpretation more difficult. In general, the metric summarizes how similar a location is, in the “multivariate attribute space” (i.e., the gene expression), to its geographic neighbours. The common difficulty in these analyses is disentangling similarity in geographic space from similarity in attribute space.
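The sketch below computes the univariate local Geary’s \(c_i\) and its multivariate sum in base R, assuming (as in Anselin 2019) that each variable is standardised first. The helper names `local_geary` and `local_multigeary` are hypothetical, and dense matrices are used for clarity only. Summing \(k\) identical variables doubles, triples, and so on, the statistic, which is one reason why values are hard to compare across different numbers of variables.

```r
# Univariate local Geary's c_i: sum_j w_ij (z_i - z_j)^2 (dense weights, base R)
local_geary <- function(z, W) {
  rowSums(W * outer(z, z, "-")^2)
}

# Multivariate local Geary: sum of the univariate statistics over standardised columns
local_multigeary <- function(X, W) {
  Z <- scale(X)                   # standardise each variable (column)
  out <- numeric(nrow(Z))
  for (v in seq_len(ncol(Z))) {
    out <- out + local_geary(Z[, v], W)   # c_{k,i} = sum_v c_{v,i}
  }
  out
}

# Four cells on a line with row-standardised neighbour weights
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
W <- W / rowSums(W)
x <- c(1, 2, 3, 4)
c_uni <- local_geary(as.vector(scale(x)), W)   # 0.6 at every cell
c_multi <- local_multigeary(cbind(x, x), W)    # twice the univariate value
```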
To speed up computation, we restrict the analysis to highly variable genes.
We can further plot the results of the permutation test. Significant values indicate interesting regions but should be interpreted with care for various reasons. For example, we are looking for similarity in a combination of multiple values, but the exact combination is not known. Anselin (2019) writes: “Overall, however, the statistic indicates a combination of the notion of distance in multi-attribute space with that of geographic neighbors. This is the essence of any spatial autocorrelation statistic. It is also the trade-off encountered in spatially constrained multivariate clustering methods (for a recent discussion, see, e.g., Grubesic, Wei, and Murray 2014).” Multi-attribute space here refers to the space spanned by the highly variable genes. The problem comes down to whether the similarity stems from the gene expression or from the physical space. The same problem is common in spatial domain detection methods.
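To illustrate how such a permutation test can work, the base R sketch below uses conditional permutation for the univariate local Geary’s \(c_i\): the value at cell \(i\) is held fixed while its neighbours’ values are replaced by random draws from the remaining cells. The helper `local_geary_perm`, the folded two-sided pseudo p-value and the simulated grid are assumptions for illustration; the exact procedure used for the plotted results may differ.

```r
# Conditional permutation test for local Geary's c_i (a base R sketch).
local_geary_perm <- function(z, W, nsim = 499) {
  n <- length(z)
  obs <- rowSums(W * outer(z, z, "-")^2)     # observed c_i
  p <- numeric(n)
  for (i in seq_len(n)) {
    nb <- which(W[i, ] > 0)
    w <- W[i, nb]
    sims <- replicate(nsim, {
      draw <- sample(z[-i], length(nb))      # random values for the neighbours of i
      sum(w * (z[i] - draw)^2)
    })
    n_le <- sum(sims <= obs[i])              # as similar or more similar than observed
    n_ge <- sum(sims >= obs[i])              # as dissimilar or more dissimilar
    p[i] <- (min(n_le, n_ge) + 1) / (nsim + 1)   # folded pseudo p-value
  }
  list(c = obs, p = p)
}

# Simulated 10 x 10 grid with rook neighbours and row-standardised weights
g <- expand.grid(r = 1:10, c = 1:10)
A <- (as.matrix(dist(g)) == 1) * 1
W <- A / rowSums(A)
set.seed(42)
z <- as.vector(scale(rnorm(100)))
res <- local_geary_perm(z, W, nsim = 199)
```

With `nsim` permutations the smallest attainable pseudo p-value is \(1/(\text{nsim}+1)\), here 0.005.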
The local neighbour match test assesses the overlap between the \(k\)-nearest neighbours based on physical distances (tissue space) and the \(k\)-nearest neighbours based on gene expression measurements (attribute space). First, a \(k\)-nearest neighbour matrix is computed for both the physical and the attribute space. Second, the probability of observing the number of neighbours that the two matrices have in common is computed (Anselin and Li 2020).
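The two steps can be sketched in base R as below. Assumptions: the attribute matrix `X` is already standardised (e.g., scaled expression of the highly variable genes), and the probability is the hypergeometric probability of exactly \(v\) shared neighbours when \(k\) of the \(n-1\) other cells are picked at random, computed with `dhyper()`. The helper names are hypothetical and this is not the rgeoda implementation. When the attribute space equals the physical space, every cell shares all \(k\) neighbours, and the probability drops to \(1/\binom{n-1}{k}\).

```r
# Indices of the k nearest neighbours of each row of M (dense distances, base R)
knn_index <- function(M, k) {
  d <- as.matrix(dist(M))
  diag(d) <- Inf                  # exclude self-matches
  do.call(rbind, lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)]))
}

# Local neighbour match test sketch: cardinality and its probability per cell
neighbour_match <- function(coords, X, k) {
  n <- nrow(coords)
  nn_geo <- knn_index(coords, k)  # k nearest neighbours in tissue space
  nn_attr <- knn_index(X, k)      # k nearest neighbours in attribute space
  card <- vapply(seq_len(n),
                 function(i) length(intersect(nn_geo[i, ], nn_attr[i, ])),
                 integer(1))
  # probability of exactly `card` matches when k of the n - 1 other cells are drawn at random
  prob <- dhyper(card, k, n - 1 - k, k)
  data.frame(cardinality = card, probability = prob)
}

set.seed(7)
coords <- cbind(runif(30), runif(30))
res_same <- neighbour_match(coords, coords, k = 4)   # identical spaces: all neighbours match
X <- matrix(rnorm(60), nrow = 30)                    # unrelated simulated attributes
res_rand <- neighbour_match(coords, X, k = 4)
```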
Cardinality is the number of neighbours the two matrices have in common. Some regions show high cardinality with a low probability and therefore share similarity in both attribute and physical space. In contrast to the multivariate local Geary’s C, this metric focuses directly on the distances rather than on a weighted average. A limitation of this approach is the so-called empty space problem: as the number of dimensions of the feature set increases, the empty space between observations also increases (Anselin and Li 2020).
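A quick simulation makes the empty space problem tangible. In the sketch below (simulated uniform data, not the CosMx dataset), the number of observations is fixed at 200 and only the number of dimensions changes; the mean distance to the nearest neighbour grows markedly with the dimension, so "nearest" neighbours in a high-dimensional attribute space are not actually close.

```r
# Mean nearest-neighbour distance of n uniform points in the unit hypercube [0, 1]^d
mean_nn_dist <- function(n, d) {
  X <- matrix(runif(n * d), nrow = n)
  D <- as.matrix(dist(X))
  diag(D) <- Inf                  # ignore the zero distance of a point to itself
  mean(apply(D, 1, min))
}

set.seed(123)
dims <- c(2, 10, 50)
nn <- sapply(dims, function(d) mean_nn_dist(200, d))
```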
In addition to the measures of spatial autocorrelation for continuous data shown above, there are methods that apply the same concept to binary and categorical data: join count statistics. In essence, the join count statistic compares the distribution of categorical marks in a lattice with the frequencies that would occur under random labelling. These random frequencies can be computed using a theoretical approximation or random permutations. The concept has also been extended to a multivariate setting with more than two categories. The corresponding spdep functions are joincount.test and joincount.multi (Bivand 2022).
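For a binary mark, the same-colour join count is \(BB = \frac{1}{2}\sum_i\sum_j w_{ij}x_ix_j\). The base R sketch below computes the BB, WW and BW counts with symmetric binary weights and compares the observed BB count with random relabellings of the same marks. The helper names are hypothetical; spdep’s `joincount.test` uses an analytical approximation, and `joincount.mc` offers a permutation version.

```r
# Join counts for a binary mark x (TRUE = black, FALSE = white) with symmetric 0/1 weights A
join_counts <- function(x, A) {
  blk <- as.numeric(x)
  wht <- 1 - blk
  c(BB = 0.5 * sum(A * outer(blk, blk)),
    WW = 0.5 * sum(A * outer(wht, wht)),
    BW = 0.5 * sum(A * (outer(blk, wht) + outer(wht, blk))))
}

# One-sided permutation test: are there more black-black joins than under random labelling?
bb_perm_test <- function(x, A, nsim = 999) {
  obs <- join_counts(x, A)[["BB"]]
  sims <- replicate(nsim, join_counts(sample(x), A)[["BB"]])
  list(BB = obs, p_value = (sum(sims >= obs) + 1) / (nsim + 1))
}

# Four cells on a line: marks B B W W give one join of each type
A4 <- matrix(0, 4, 4)
A4[cbind(1:3, 2:4)] <- 1
A4[cbind(2:4, 1:3)] <- 1
jc <- join_counts(c(TRUE, TRUE, FALSE, FALSE), A4)

# 10 x 10 grid with rook neighbours; the upper half is black (strongly clustered)
g <- expand.grid(r = 1:10, c = 1:10)
A <- (as.matrix(dist(g)) == 1) * 1
set.seed(11)
test_res <- bb_perm_test(g$r <= 5, A)
```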
The local methods presented above should always be interpreted with care, since computing them for each cell raises the problem of multiple testing. Moreover, the presented methods should mainly serve as exploratory measures to identify interesting regions in the data. Multiple processes can lead to the same pattern, so identifying a pattern does not allow us to infer the underlying process. An indication of clustering does not explain why it occurs. On the one hand, clustering can be the result of spatial interaction between the variables of interest, e.g., an accumulation of a gene of interest in one region of the tissue. On the other hand, clustering can be the result of spatial heterogeneity, when local similarity is created by structural heterogeneity in the tissue, e.g., when cells with uniform expression of a gene of interest are grouped together, which then creates the apparent clustering of the gene expression measurement.
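One common way to address the multiple testing problem is to adjust the per-cell pseudo p-values, for example with the Benjamini-Hochberg false discovery rate or the stricter Bonferroni correction, both available through base R’s `p.adjust()`. This adjustment is not part of the analysis above, and the p-values below are simulated. The last lines show an additional constraint: with 499 permutations the smallest attainable pseudo p-value is 0.002, far above a Bonferroni threshold for 100,290 cells, so no cell could pass that correction.

```r
# Simulated per-cell pseudo p-values (hypothetical, for illustration only)
set.seed(1)
p_local <- c(runif(95), rep(0.001, 5))

p_bh <- p.adjust(p_local, method = "BH")            # controls the false discovery rate
p_bonf <- p.adjust(p_local, method = "bonferroni")  # controls the family-wise error rate
n_sig_raw <- sum(p_local < 0.05)
n_sig_bh <- sum(p_bh < 0.05)

# Resolution limit of permutation tests versus a Bonferroni threshold
min_p <- 1 / (499 + 1)              # smallest pseudo p-value with 499 permutations
bonf_threshold <- 0.05 / 100290     # alpha divided by the number of cells in this dataset
```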
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.5
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Zurich
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] magrittr_2.0.3 stringr_1.5.0
[3] dixon_0.0-8 splancs_2.01-44
[5] spdep_1.2-8 spData_2.3.0
[7] tmap_3.3-4 scater_1.28.0
[9] scran_1.28.2 scuttle_1.10.3
[11] SFEData_1.2.0 SpatialFeatureExperiment_1.2.3
[13] Voyager_1.2.7 rgeoda_0.0.10-4
[15] digest_0.6.33 ncf_1.3-2
[17] sf_1.0-16 reshape2_1.4.4
[19] patchwork_1.2.0 STexampleData_1.8.0
[21] ExperimentHub_2.8.1 AnnotationHub_3.8.0
[23] BiocFileCache_2.8.0 dbplyr_2.3.4
[25] RANN_2.6.1 seg_0.5-7
[27] sp_2.1-1 rlang_1.1.1
[29] ggplot2_3.5.1 dplyr_1.1.3
[31] mixR_0.2.0 spatstat_3.0-6
[33] spatstat.linnet_3.1-1 spatstat.model_3.2-6
[35] rpart_4.1.19 spatstat.explore_3.2-3
[37] nlme_3.1-162 spatstat.random_3.1-6
[39] spatstat.geom_3.2-5 spatstat.data_3.0-1
[41] SpatialExperiment_1.10.0 SingleCellExperiment_1.22.0
[43] SummarizedExperiment_1.30.2 Biobase_2.60.0
[45] GenomicRanges_1.52.1 GenomeInfoDb_1.36.4
[47] IRanges_2.34.1 S4Vectors_0.38.2
[49] BiocGenerics_0.46.0 MatrixGenerics_1.12.3
[51] matrixStats_1.0.0
loaded via a namespace (and not attached):
[1] spatstat.sparse_3.0-2 bitops_1.0-7
[3] httr_1.4.7 RColorBrewer_1.1-3
[5] tools_4.3.1 utf8_1.2.3
[7] R6_2.5.1 HDF5Array_1.28.1
[9] mgcv_1.9-1 rhdf5filters_1.12.1
[11] withr_2.5.1 gridExtra_2.3
[13] leaflet_2.2.0 leafem_0.2.3
[15] cli_3.6.1 labeling_0.4.3
[17] proxy_0.4-27 R.utils_2.12.2
[19] dichromat_2.0-0.1 scico_1.5.0
[21] limma_3.56.2 rstudioapi_0.15.0
[23] RSQLite_2.3.1 generics_0.1.3
[25] crosstalk_1.2.0 Matrix_1.5-4.1
[27] ggbeeswarm_0.7.2 fansi_1.0.5
[29] abind_1.4-5 R.methodsS3_1.8.2
[31] terra_1.7-55 lifecycle_1.0.3
[33] yaml_2.3.7 edgeR_3.42.4
[35] rhdf5_2.44.0 tmaptools_3.1-1
[37] grid_4.3.1 blob_1.2.4
[39] promises_1.2.1 dqrng_0.3.1
[41] crayon_1.5.2 lattice_0.21-8
[43] beachmat_2.16.0 KEGGREST_1.40.1
[45] magick_2.8.0 pillar_1.9.0
[47] knitr_1.44 metapod_1.7.0
[49] rjson_0.2.21 boot_1.3-28.1
[51] codetools_0.2-19 wk_0.8.0
[53] glue_1.6.2 vctrs_0.6.4
[55] png_0.1-8 gtable_0.3.4
[57] cachem_1.0.8 xfun_0.40
[59] S4Arrays_1.0.6 mime_0.12
[61] DropletUtils_1.20.0 units_0.8-4
[63] statmod_1.5.0 bluster_1.10.0
[65] interactiveDisplayBase_1.38.0 ellipsis_0.3.2
[67] bit64_4.0.5 filelock_1.0.2
[69] irlba_2.3.5.1 vipor_0.4.5
[71] KernSmooth_2.23-21 colorspace_2.1-0
[73] DBI_1.1.3 raster_3.6-26
[75] tidyselect_1.2.0 bit_4.0.5
[77] compiler_4.3.1 curl_5.1.0
[79] BiocNeighbors_1.18.0 DelayedArray_0.26.7
[81] scales_1.3.0 classInt_0.4-10
[83] rappdirs_0.3.3 goftest_1.2-3
[85] spatstat.utils_3.0-5 rmarkdown_2.25
[87] XVector_0.40.0 htmltools_0.5.6.1
[89] pkgconfig_2.0.3 base64enc_0.1-3
[91] sparseMatrixStats_1.12.2 fastmap_1.1.1
[93] htmlwidgets_1.6.2 shiny_1.7.5.1
[95] DelayedMatrixStats_1.22.6 farver_2.1.1
[97] jsonlite_1.8.7 BiocParallel_1.34.2
[99] R.oo_1.25.0 BiocSingular_1.16.0
[101] RCurl_1.98-1.12 GenomeInfoDbData_1.2.10
[103] s2_1.1.4 Rhdf5lib_1.22.1
[105] munsell_0.5.0 Rcpp_1.0.11
[107] ggnewscale_0.4.9 viridis_0.6.4
[109] stringi_1.7.12 leafsync_0.1.0
[111] zlibbioc_1.46.0 plyr_1.8.9
[113] parallel_4.3.1 ggrepel_0.9.4
[115] deldir_1.0-9 Biostrings_2.68.1
[117] stars_0.6-4 splines_4.3.1
[119] tensor_1.5 locfit_1.5-9.8
[121] igraph_1.5.1 ScaledMatrix_1.8.1
[123] BiocVersion_3.17.1 XML_3.99-0.14
[125] evaluate_0.22 BiocManager_1.30.22
[127] httpuv_1.6.11 purrr_1.0.2
[129] polyclip_1.10-6 scattermore_1.2
[131] rsvd_1.0.5 lwgeom_0.2-13
[133] xtable_1.8-4 e1071_1.7-13
[135] RSpectra_0.16-1 later_1.3.1
[137] viridisLite_0.4.2 class_7.3-22
[139] tibble_3.2.1 memoise_2.0.1
[141] beeswarm_0.4.0 AnnotationDbi_1.62.2
[143] cluster_2.1.4
©2024 The pasta authors. Content is published under Creative Commons CC-BY-4.0 License for the text and GPL-3 License for any code.